
ENHANCING THE PERFORMANCE OF CODING SYSTEMS THAT USE HIGH FREQUENCY 
RECONSTRUCTION METHODS 

TECHNICAL FIELD

The present invention relates to digital audio coding systems that employ high frequency reconstruction
(HFR) methods. It enables a more consistent core codec performance and improves the audio quality of the
combined core codec and HFR system.

BACKGROUND OF THE INVENTION

Audio source coding techniques can be divided into two classes: natural audio coding and speech coding.
Natural audio coding is commonly used for music or arbitrary signals at medium bitrates. Speech codecs
are basically limited to speech reproduction, but can on the other hand be used at very low bitrates. In
both classes, the signal is generally separated into two major signal components, a spectral envelope and a
corresponding residual signal. Codecs that make use of such a division exploit the fact that the spectral
envelope can be coded much more efficiently than the residual. In systems where high frequency
reconstruction methods are used, no residual corresponding to the highband is transmitted. Instead, a
highband is generated at the decoder side from the lowband covered by the core codec, and shaped to
obtain the desired highband spectral envelope. In double-ended HFR systems, envelope data
corresponding to the upper frequency range is transmitted, whereas in single-ended HFR systems the
highband envelope is derived from the lowband. In either case, prior art audio codecs apply a time
invariant crossover frequency between the core codec frequency range and the HFR frequency range.
Thus, at a given bitrate, the crossover frequency is selected such that a good trade-off between core codec
introduced artifacts and HFR system introduced artifacts is achieved for typical programme material.

Clearly, such a static setting may be far from the optimum for a particular signal: the core codec is either
overstressed, resulting in higher than necessary lowband artifacts, which, inherent to the HFR method, also
degrade the highband quality, or not used to its full potential, i.e. a larger than necessary HFR frequency
range is employed. Hence, the maximum performance of the joint coding system is only occasionally
reached by prior art systems. Furthermore, the possibility to align the crossover to transitions between
regions with disparate spectral properties, such as tonal and noise-like regions, is not exploited.



SUMMARY OF THE INVENTION 

The present invention provides a new method and an apparatus for improvement of coding systems where
high frequency reconstruction (HFR) methods are used. The invention departs from the traditional usage of
a fixed crossover frequency between the lowband, where conventional coding schemes (such as MPEG
Layer-3 or AAC) are used, and the highband, where HFR coding schemes are used, by continuous
estimation and application of the crossover frequency that yields the optimum tradeoff between artifacts
introduced by the lowband codec and the HFR system respectively. According to the invention, the
choice can be based on a measure of the degree of difficulty of encoding a signal with the core codec, a
short-time bit demand detection, and a spectral tonality analysis, or any combination thereof. The
measure of difficulty can be derived from the perceptual entropy, or the psychoacoustically relevant core
codec distortion. Since the optimum choice changes frequently over time, the application of a variable
crossover frequency results in a substantially improved audio quality, which also is less dependent on
program material characteristics. The invention is applicable to single-ended and double-ended HFR
systems.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described by way of illustrative examples, not limiting the scope or
spirit of the invention, with reference to the accompanying drawings, in which:

Fig. 1 is a graph that illustrates the terms lowband, highband and crossover frequency.

Fig. 2 is a graph that illustrates a core codec workload measure.

Fig. 3 is a graph that illustrates short time bit-demand variations of a constant bitrate codec. 

Fig. 4 is a graph that illustrates division of a signal into tonal and noise-like frequency ranges. 

Fig. 5 is a block diagram of an HFR-based encoder, enhanced by a crossover frequency control module.

Fig. 6 is a block diagram, which illustrates the crossover frequency control module in detail. 

Fig. 7 is a block diagram of the corresponding HFR-based decoder. 



DESCRIPTION OF PREFERRED EMBODIMENTS

The below-described embodiments are merely illustrative of the principles of the present invention. It is
understood that modifications and variations of the arrangements and the details described herein will be
apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the
impending patent claims and not by the specific details presented by way of description and explanation
of the embodiments herein.

In a system where the lowband or low frequency range, 101 as given in Fig. 1, is encoded by a core codec
and the highband or high frequency range, 102, is covered by a suitable HFR method, the border between
the two ranges can be defined as the crossover frequency, 103. Since the encoding schemes operate on a
block-wise frame-by-frame basis, one is free to change the crossover frequency for every processed
frame. According to the present invention, it is possible to set up a detection algorithm that adapts the
crossover frequency such that the optimum quality for the combined coding system is achieved. The
implementation thereof is hereinafter referred to as the crossover frequency control module.

Taking into account that the audio quality of the core codec is also the basis for the quality of the
reconstructed highband, it is obvious that a good and constant audio quality in the lowband range is
desired. By lowering the crossover frequency, the frequency range that the core codec has to cope with is
smaller, and thus easier to encode. Thus, by measuring the degree of difficulty of encoding a frame and
adjusting the crossover frequency accordingly, a more constant audio quality of the core encoder can be
achieved.

As an example of how to measure the degree of difficulty, the perceptual entropy [ISO/IEC 13818-7,
Annex B.2.1] may be used: Here a psychoacoustic model based on a spectral analysis is applied. Usually
the spectral lines of the analysis filter bank are grouped into bands, where the number of lines within a
band depends on the band center frequency and is chosen according to the well-known Bark scale, aiming
at a perceptually constant frequency resolution for all bands. By using a psychoacoustic model that
exploits effects such as spectral or temporal masking, thresholds of audibility for every band are obtained.
The perceptual entropy within a band is then given by

and

i = spectral line index within current band
s(i) = spectral value of line i
L(b) = number of lines in current band
t(b) = psychoacoustic threshold for current band
b = band index
l = number of lines in current band such that r(i) > 1.0

and only terms such that r(i) > 1.0 are used in the summation.



By summing up the perceptual entropies of all bands that have to be coded in the lowband frequency
range, a measure of the encoding difficulty for the current frame is obtained.

A similar approach is to calculate the distortion energy at the end of the core codec encoding process by
summing up the distortion energy of every band according to

$$n_{tot} = \sum_{b=0}^{B-1} n(b) \qquad (Eq.\ 2)$$

where

$$n(b) = \begin{cases} n_q(b) - t(b) & \text{for } n_q(b)/t(b) > 1.0 \\ 0 & \text{otherwise} \end{cases}$$

and

n_q(b) = quantization noise energy
t(b) = psychoacoustic threshold
b = band index
B = number of bands
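
The per-band rule and the summation over bands can be sketched as follows. This is a minimal
illustration only: the function names and the plain Python lists are assumptions made for the example,
and the noise energies and thresholds would in practice come from the core encoder and its
psychoacoustic model.

```python
def band_distortion(noise_energy, threshold):
    """Audible distortion n(b) of one band.

    Only the part of the quantization noise that exceeds the psychoacoustic
    threshold counts. For a positive threshold, noise_energy > threshold is
    equivalent to noise_energy / threshold > 1.0.
    """
    if noise_energy > threshold:
        return noise_energy - threshold
    return 0.0


def total_distortion(noise_energies, thresholds):
    """Sum n(b) over all B bands of the current frame."""
    if len(noise_energies) != len(thresholds):
        raise ValueError("one threshold per band is required")
    return sum(band_distortion(n, t) for n, t in zip(noise_energies, thresholds))
```

For instance, `total_distortion([4.0, 1.0, 9.0], [2.0, 2.0, 3.0])` counts only the first and third band,
since the noise in the second band stays below its threshold.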



Furthermore, the distortion energy may be weighted by a loudness curve, in order to weight the actual
distortion according to its psychoacoustic relevance. As an example, the summation in Eq. 2 can be
modified to

$$n_{tot} = \sum_{b=0}^{B-1} \left(n(b)\right)^{0.23} \qquad (Eq.\ 3)$$

where a simplification of a loudness function according to Zwicker is used ["Psychoacoustics", Eberhard
Zwicker and Hugo Fastl, Springer-Verlag, Berlin 1990].



An encoding difficulty or workload measure can then be defined as a function of the total distortion.
Fig. 2 gives an example of the distortion energy of a perceptual audio codec, and a corresponding
workload measure, where a non-linear recursion has been used to calculate the workload. It can be
observed that the workload shows high deviations over time and is dependent on the input material
characteristics.
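
The text does not specify which non-linear recursion is used. Purely as an illustration, the sketch below
assumes a first-order recursion whose coefficient depends on whether the distortion is rising or falling
(fast attack, slow release). The function name and both coefficients are hypothetical.

```python
def workload_track(distortions, attack=0.5, release=0.9):
    """Turn per-frame total distortion into a smoothed workload measure.

    Assumed recursion (not taken from the source): the state follows rising
    distortion quickly (attack) and decays slowly afterwards (release).
    Switching the coefficient on the comparison makes it non-linear.
    """
    workload = 0.0
    track = []
    for d in distortions:
        coeff = attack if d > workload else release
        workload = coeff * workload + (1.0 - coeff) * d
        track.append(workload)
    return track
```

With this choice a single difficult frame raises the workload at once, and the measure then falls back
gradually over the following frames.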

High perceptual entropy or high distortion energy indicates that a signal is psychoacoustically hard to
code at a limited bitrate, and audible artifacts in the lowband are likely to appear. In this case the
crossover frequency control module shall signal to use a lower crossover frequency in order to make it
easier for the perceptual audio encoder to cope with the given signal. Conversely, low perceptual
entropy or low distortion energy indicates an easy-to-code signal. Thus the crossover frequency shall be
chosen higher in order to allow a wider frequency range for the lowband, thereby reducing artifacts that
are likely to be introduced in the highband due to the limited capabilities of any existing HFR method.
Both approaches also allow usage of an analysis-by-synthesis approach by re-encoding the current frame
if an adjustment of the crossover frequency has been signaled in the analysis stage. However, since
overlapping transforms are used in most state-of-the-art audio codecs, the performance of the system may
be improved by applying a smoothing of the analysis input parameters over time, in order to avoid too
frequent switching of the crossover frequency, which could cause blocking effects. If the actual
implementation does not need to be optimized in terms of processing delay, the detection algorithm can
be further improved by using a larger look-ahead in time, offering the possibility to find points in time
where shifts can be done with a minimum of switching artifacts. Non-realtime applications represent a
special case of this, where the entire file to be encoded can be analyzed, if desired.
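
One way to picture the adaptation with limited switching is sketched below. It assumes a workload
normalized to the range 0 to 1, a table of candidate crossover frequencies, two decision thresholds and a
minimum hold time between switches. All of these names and values are assumptions for the example,
not parameters given in the text.

```python
def choose_crossover(workloads, table_hz, low=0.3, high=0.7, hold=3):
    """Pick one crossover frequency per frame from table_hz (ascending).

    A high workload steps the crossover down, a low workload steps it up.
    After a switch, at least `hold` frames pass before the next one, which
    limits how often the crossover frequency changes.
    """
    index = len(table_hz) // 2          # start in the middle of the table
    frames_since_switch = hold          # allow a switch on the first frame
    chosen = []
    for w in workloads:
        target = index
        if w > high and index > 0:
            target = index - 1          # hard frame: narrower lowband
        elif w < low and index < len(table_hz) - 1:
            target = index + 1          # easy frame: wider lowband
        if target != index and frames_since_switch >= hold:
            index = target
            frames_since_switch = 0
        else:
            frames_since_switch += 1
        chosen.append(table_hz[index])
    return chosen
```

A look-ahead variant would inspect future entries of `workloads` before committing to a switch, which
this sketch does not attempt.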

In the case of a constant bitrate (CBR) audio codec, a short time bit-demand variation analysis may be
used as an additional input parameter in the crossover decision: State-of-the-art audio encoders such as
MPEG Layer-3 or MPEG-2 AAC use a bit reservoir technique in order to compensate for short time peak
bit-demand deviations from the average number of available bits per frame. The fullness of such a bit
reservoir indicates whether the core encoder is able to cope well with an upcoming difficult-to-encode
frame or not. A practical example of the number of used bits per frame, and the bit reservoir fullness over
time, is given in Fig. 3. Thus, if the bit reservoir fullness is high, the core encoder will be able to handle a
difficult frame and there is no need to choose a lower crossover frequency. Conversely, if the bit
reservoir fullness is low, the resulting audio quality may be substantially improved in the following
frames by lowering the crossover frequency, in order to reduce the core encoder bit demand, such that the
bit reservoir can be filled up due to the smaller frequency range that has to be encoded. Again, a large
look-ahead can improve the detection method since the behavior of the bit reservoir fullness may be
predicted well in advance.
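
The bit reservoir behaviour can be illustrated with a simple bookkeeping model. The sketch assumes
that each frame adds the average frame budget to the reservoir and removes the bits actually used,
clamped to the reservoir capacity. The names and the 25 % threshold are hypothetical.

```python
def reservoir_track(bits_used, mean_bits, capacity):
    """Relative bit reservoir fullness (0.0 to 1.0) after each frame."""
    fullness = capacity                 # assume the reservoir starts full
    track = []
    for used in bits_used:
        fullness = min(capacity, max(0, fullness + mean_bits - used))
        track.append(fullness / capacity)
    return track


def suggest_lower_crossover(relative_fullness, min_fullness=0.25):
    """True when the reservoir is nearly empty, so that a lower crossover
    frequency would help to refill it over the following frames."""
    return relative_fullness < min_fullness
```

For example, `reservoir_track([1500, 1500, 500], 1000, 2000)` drains the reservoir over two demanding
frames and partly refills it on the third.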

Besides the encoding difficulty of the current frame, another important parameter to base the choice of
the crossover frequency on is described as follows: A large number of audio signals such as speech or
some musical instruments show the property that the spectral range can be divided into a pitched or tonal
range and a noise-like range. Fig. 4 shows the spectrum of an audio input signal where this property is
clearly evident. Using tonality and/or noise analysis methods in the spectral domain, two ranges may be
detected, which can be classified as tonal and noise-like respectively. The tonality can be calculated as
given for example in the AAC standard [ISO/IEC 13818-7:1997(E), pp. 96-98, section B.2.1.4 "Steps in
threshold calculation"]. Other well-known tonality or noise detection algorithms such as the spectral
flatness measure are also suited for the purpose. Thus the transition frequency between these ranges is
used as the crossover frequency in the context of the present invention, in order to better separate the
tonal and noise-like spectral ranges and feed them separately to the core encoder and the HFR method
respectively. Hence the overall audio quality of the combined codec system can be substantially improved
in such cases.
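
As a sketch of the spectral flatness alternative mentioned above, the following example classifies each
band by the ratio of the geometric to the arithmetic mean of its power values and looks for the band
from which the spectrum stays noise-like up to the top. The band layout, the function names and the
threshold of 0.5 are assumptions made for the illustration.

```python
import math


def spectral_flatness(power):
    """Geometric mean over arithmetic mean of power values (0 < SFM <= 1).

    Values near 1 indicate a flat, noise-like band; values near 0 indicate
    a peaky, tonal band.
    """
    eps = 1e-12                         # avoids log(0) and division by zero
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power) + eps
    return math.exp(log_mean) / arith_mean


def tonal_noise_transition(band_powers, flat_threshold=0.5):
    """Index of the lowest band from which all higher bands are noise-like.

    Returns None if even the top band is tonal, i.e. no transition exists.
    """
    transition = None
    for b in range(len(band_powers) - 1, -1, -1):
        if spectral_flatness(band_powers[b]) >= flat_threshold:
            transition = b
        else:
            break
    return transition
```

The returned band index would then be mapped to a frequency and offered to the crossover decision as
a target value.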

Clearly, the above methods are applicable to double-ended and single-ended HFR systems alike. In the
latter case, only a lowband of varying bandwidth, encoded by the core codec, is transmitted. The HFR
decoder then extrapolates an envelope from the lowband cutoff frequency and upwards. Furthermore, the
present invention is applicable to systems where the highband is generated by arbitrary methods different
from the one that is used for coding of the lowband.

Adapting the HFR start frequency to the varying bandwidth of the lowband signal would be a very
tedious task when applying conventional transposition methods such as frequency translation. Those
methods generally involve filtering of the lowband signal to extract a lowpass or bandpass signal that
subsequently is modulated in the time domain, causing a frequency shift. Thus, an adaptation would
incorporate switching of lowpass or bandpass filters and changes in the modulation frequency.
Furthermore, a change of filter causes discontinuities in the output signal, which impels the use of
windowing techniques. However, in a filterbank-based system, the filtering is automatically achieved by
extraction of subband signals from a set of consecutive filterbands. An equivalent to the time domain
modulation is then obtained by means of repatching of the extracted subband signals within the filterbank.
The repatching is easily adapted to the varying crossover frequency, and the aforementioned windowing
is inherent in the subband domain, so the change of translation parameters is achieved at little additional
complexity.
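
The idea of repatching can be illustrated with a deliberately simplified sketch. It assumes that the
signal is available as a list of subband signals, that the crossover is expressed as a subband index, and
that highband subbands are filled by copying lowband subbands in ascending order, wrapping around
when the lowband is exhausted. The function name and the wrap-around rule are assumptions, not the
patching scheme of any particular codec.

```python
def repatch(subbands, crossover_band, num_bands):
    """Build a full-band set of subband signals from the lowband only.

    subbands       -- list of subband signals (each a list of samples)
    crossover_band -- index of the first highband subband for this frame
    num_bands      -- total number of subbands in the output
    """
    if not 0 < crossover_band <= len(subbands):
        raise ValueError("crossover_band must address an available subband")
    low = subbands[:crossover_band]     # subbands covered by the core codec
    out = [list(band) for band in low]
    for k in range(crossover_band, num_bands):
        source = (k - crossover_band) % crossover_band
        out.append(list(low[source]))   # copy a lowband subband upwards
    return out
```

Changing the crossover from frame to frame only changes `crossover_band`, so no filters have to be
redesigned or switched.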

Fig. 5 shows an example of the encoder side of an HFR-based codec, enhanced according to the present
invention. The analogue input signal is fed to an A/D-converter 501, forming a digital signal. The digital
audio signal is fed to a core encoder 502, where source coding is performed. In addition, the digital
signal is fed to an HFR envelope encoder 503. The output of the HFR envelope encoder represents the
envelope data covering the highband 102 starting at the crossover frequency 103 as illustrated in Fig. 1.
The number of bits that is needed for the envelope data in the envelope encoder is passed to the core
encoder in order to be subtracted from the total available bits for a given frame. The core encoder will
then encode the remaining lowband frequency range up to the crossover frequency. As taught by the
present invention, a crossover frequency control module 504 is added to the encoder. A time- and/or
frequency-domain representation of the input signal, as well as core codec status signals, is fed to the
crossover frequency control module. The output of the module 504, in the form of the optimum choice of
the crossover frequency, is fed to the core and envelope encoders in order to signal the frequency ranges
that shall be encoded. The frequency range for each of the two coding schemes is also encoded, for
example by an efficient table lookup scheme. If the frequency range does not change between two
subsequent frames, this can be signaled by one single bit in order to keep the bitrate overhead as small as
possible. Hence the frequency ranges do not have to be transmitted explicitly in every frame. The encoded
data of both encoders is then fed to the multiplexer, forming a serial bit stream that is transmitted or stored.
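
The signaling of the frequency range can be sketched as follows. The example assumes that the
crossover frequency is identified by an index into a table known to both encoder and decoder, that a
single flag bit tells whether the index changed, and that a changed index is sent with a fixed number of
bits. The bit layout, the 3-bit index width and the use of a character string as bitstream are
illustrative assumptions.

```python
def encode_crossover(indices, index_bits=3):
    """Encode one table index per frame: '0' = unchanged, '1' + index = new."""
    bits = []
    previous = None
    for idx in indices:
        if not 0 <= idx < 2 ** index_bits:
            raise ValueError("index does not fit into index_bits")
        if idx == previous:
            bits.append("0")
        else:
            bits.append("1" + format(idx, "0{}b".format(index_bits)))
        previous = idx
    return "".join(bits)


def decode_crossover(bits, frames, index_bits=3):
    """Inverse of encode_crossover for a known number of frames."""
    out, pos, current = [], 0, None
    for _ in range(frames):
        flag = bits[pos]
        pos += 1
        if flag == "1":
            current = int(bits[pos:pos + index_bits], 2)
            pos += index_bits
        out.append(current)
    return out
```

For four frames with indices 5, 5, 5, 2 the encoder spends four bits on the first and last frame and a
single bit on each of the two unchanged frames.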

Fig. 6 gives an example of subsystems within the crossover frequency control module 504, and 601
respectively. An encoder workload measure analysis module 602 explores how difficult the current
frame is to code for the core encoder, using for example the perceptual entropy or the distortion energy
approach as described above. Provided that the core codec employs a bit reservoir, a buffer fullness
analysis module 603 may be included. A tonality analysis module 604 signals a target crossover
frequency corresponding to the tonal/noise transition frequency when applicable. All input parameters to
the joint decision module 606 are combined and balanced according to the actual implementation of the
used core and HFR codecs when calculating the crossover frequency to use, in order to obtain the
maximum overall performance.
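
How the inputs are balanced is left to the implementation. The sketch below shows one hypothetical
combination rule only: the workload selects a base entry in a table of candidate crossover
frequencies, the bit reservoir fullness shifts it by one step, and a tonal/noise transition target is adopted
when it lies within one table step of that choice. The function name, the thresholds and the rule itself
are assumptions.

```python
def joint_decision(table_hz, workload, fullness, tonal_target_hz=None):
    """Combine workload (0..1), reservoir fullness (0..1) and an optional
    tonal/noise transition frequency into one crossover frequency."""
    top = len(table_hz) - 1
    workload = min(1.0, max(0.0, workload))
    index = round((1.0 - workload) * top)   # hard signal -> low crossover
    if fullness > 0.75:
        index += 1                          # spare bits: widen the lowband
    elif fullness < 0.25:
        index -= 1                          # nearly empty: narrow the lowband
    index = min(top, max(0, index))
    if tonal_target_hz is not None:
        nearest = min(range(len(table_hz)),
                      key=lambda i: abs(table_hz[i] - tonal_target_hz))
        if abs(nearest - index) <= 1:
            index = nearest                 # align with the spectral transition
    return table_hz[index]
```

A real module would tune such a rule to the specific core and HFR codecs in use.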

The corresponding decoder side is shown in Fig. 7. The demultiplexer 701 separates the bitstream signals
into core codec data, which is fed to the core decoder 702, and envelope data, which is fed to the HFR
envelope decoder 703. The core decoder produces a signal covering the lowband frequency range.
Similarly, the HFR envelope decoder decodes the data into a representation of the spectral envelope for
the highband frequency range. The decoded envelope data is then fed to the gain control module 704.
The lowband signal from the core decoder is routed to the transposition module 705, which, based on the
crossover frequency, generates a replicated highband signal from the lowband. The highband signal is
fed to the gain control module in order to adjust the highband spectral envelope to that of the transmitted
envelope. The output is thus an envelope adjusted highband audio signal. This signal is added to the
output from the delay unit 706, which is fed with the lowband audio signal, where the delay
compensates for the processing time of the highband signal. Finally, the obtained digital wideband signal
is converted to an analogue audio signal in the D/A-converter 707.
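
The envelope adjustment performed by the gain control module can be illustrated with a minimal
sketch. It assumes that the replicated highband is available as a list of subband signals and that the
decoded envelope gives one target mean energy per subband. These representations and the function
name are assumptions for the example.

```python
import math


def adjust_envelope(highband, target_energy):
    """Scale each replicated highband subband to its transmitted energy.

    highband      -- list of subband signals (each a list of samples)
    target_energy -- desired mean energy per subband from the envelope data
    """
    adjusted = []
    for samples, target in zip(highband, target_energy):
        energy = sum(x * x for x in samples) / len(samples)
        # A silent subband cannot be scaled to a target, so it stays silent.
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        adjusted.append([gain * x for x in samples])
    return adjusted
```

The adjusted highband would then be added to the delayed lowband signal to form the wideband
output.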



